Questioning the influence of roots: supplementary data

Setup and data preparation

1. Raw data exploration

Table 1: Summary of taxonomic occurrences and sampling effort

             Biological data     Sampling effort
Taxon        Total occurrences   Sample images   Mean occurrences per image
acari        538                 272             1.98
collembola   1548                391             3.96
gastropoda   633                 263             2.41

Note: 495 unique images were analyzed in total, across 8 scanner units and 7 period units.
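The mean column matches total occurrences divided by sample images (for example 538 / 272 ≈ 1.98 for acari), not total occurrences divided by the 495 images analyzed overall. Here is a minimal Python check of that arithmetic that uses only the values in Table 1:

```python
# Values copied from Table 1: taxon -> (total occurrences, sample images)
table1 = {
    "acari": (538, 272),
    "collembola": (1548, 391),
    "gastropoda": (633, 263),
}


def mean_per_image(total: int, images: int) -> float:
    """Mean occurrences per image, rounded to two decimals as in Table 1."""
    return round(total / images, 2)


means = {taxon: mean_per_image(*counts) for taxon, counts in table1.items()}
print(means)
```

The recomputed means reproduce the published column (1.98, 3.96 and 2.41).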
Figure 1: Spatio-temporal sampling effort: total images analyzed per scanner and period
(a) Rows: Biological and Environmental metrics | Columns: Spatial replicates (Scanners)
Figure 2: Replicate-level variability of studied metrics

1. The rhizosphere effect: attraction and spatial signature

Do soil fauna actively select the rhizosphere space?

Why use z-scores instead of simple differences?

To evaluate whether soil fauna actively select the rhizosphere, we must distinguish biological attraction from spatial chance. We used a standardized selection index (\(z\)-score), calculated as:

\[z = \frac{Observed - Expected}{SD_{expected}}\]
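Below is a minimal sketch of how the index could be computed. It assumes that Observed is an individual's distance to the nearest root, and that Expected and \(SD_{expected}\) come from a Monte Carlo null model of uniformly random positions in the same image. This is an assumption made for illustration, because the pipeline behind the Integrated Spatial SD is not described here. The root geometry, the image size and the function names (nearest_root_distance, null_distribution, selection_z) are all hypothetical.

```python
import math
import random
import statistics


def nearest_root_distance(x, y, root_points):
    """Euclidean distance from (x, y) to the closest root point."""
    return min(math.hypot(x - rx, y - ry) for rx, ry in root_points)


def null_distribution(root_points, width, height, n_draws=2000, seed=1):
    """Hypothetical null model: distances to the nearest root for points
    placed uniformly at random in a width x height image."""
    rng = random.Random(seed)
    return [
        nearest_root_distance(rng.uniform(0, width), rng.uniform(0, height), root_points)
        for _ in range(n_draws)
    ]


def selection_z(observed, null_distances):
    """z = (observed - expected) / SD_expected, both taken from the null draws."""
    expected = statistics.fmean(null_distances)
    sd_expected = statistics.stdev(null_distances)
    return (observed - expected) / sd_expected


# Hypothetical root: a vertical line of points at x = 5 cm in a 10 x 10 cm image
root = [(5.0, y / 10) for y in range(101)]
null = null_distribution(root, width=10, height=10)
z_close = selection_z(0.2, null)  # animal 2 mm from the root
z_far = selection_z(4.8, null)    # animal near the image edge
print(round(z_close, 2), round(z_far, 2))
```

With the root running down the middle of a 10 × 10 cm image, random points lie about 2.5 cm from it on average. An animal 2 mm from the root therefore gets a clearly negative \(z\), and one near the edge gets a positive \(z\).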

A simple arithmetic difference (\(Observed - Expected\)) might seem intuitive, but it cannot be compared across habitats. Its magnitude depends on the spatial variability of each habitat, which introduces a scaling bias.

The “randomness” of an animal’s position is dictated by the underlying geometry of the habitat. This complexity is captured by the Integrated Spatial SD (\(SD_{expected}\)). If we rely on unnormalized differences, we encounter two major issues:

  • Scale Sensitivity: A 1 cm deviation is a strong biological signal in a dense, low-SD root system, but merely “statistical noise” in a sparse, high-SD environment.

  • Heteroscedasticity: As shown in the “Unnormalized Signal” plot, the variance of the raw difference expands as spatial uncertainty increases. This “fan shape” indicates that the metric’s reliability is inconsistent across the dataset.

Dividing the deviation by \(SD_{expected}\) expresses it in units of expected spatial variability, which “flattens” the spatial noise. The \(z\)-score therefore measures attraction relative to the local spatial probability, not in absolute distance.
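The scale-sensitivity argument can be made concrete with two hypothetical habitats. In both, an animal sits 1 cm closer to the root than expected. The SD values are invented for illustration.

```python
def z_score(observed: float, expected: float, sd_expected: float) -> float:
    """Standardized selection index: (observed - expected) / SD_expected."""
    if sd_expected <= 0:
        raise ValueError("sd_expected must be positive")
    return (observed - expected) / sd_expected


expected = 2.0     # hypothetical expected distance to the root (cm)
deviation = -1.0   # the animal sits 1 cm closer to the root than expected
habitats = {"dense, low-SD": 0.4, "sparse, high-SD": 3.0}  # hypothetical SD_expected (cm)

results = {
    name: z_score(expected + deviation, expected, sd)
    for name, sd in habitats.items()
}
for name, z in results.items():
    print(f"{name}: z = {z:.2f}")
```

The same 1 cm deviation gives z = -2.5 in the dense habitat but only about -0.33 in the sparse one.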

Figure 3: Normalization comparison

The “Normalized Signal” plot shows that the \(z\)-score remains stable across the entire range of spatial complexity. Standardization removes the dependence on habitat-scale variability. A strongly negative \(z\)-score (an animal closer to the root than expected by chance) therefore represents a consistent biological “pull” toward the root, whether the habitat is simple or complex.
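The fan shape can be reproduced with a toy simulation. It assumes no selection and a Normal distribution of observed values around Expected with spread \(SD_{expected}\). This is a simplification, since the real null distributions need not be Normal. The spread of the raw difference grows with \(SD_{expected}\), while the spread of \(z\) stays close to 1 in every habitat.

```python
import random
import statistics

rng = random.Random(42)


def simulate_null(sd_expected: float, n: int = 5000):
    """Observed values drawn with no selection: Normal around Expected with
    spread SD_expected (a simplifying assumption). Returns raw differences and z-scores."""
    expected = 3.0 * sd_expected  # hypothetical: sparser habitats, larger distances
    raw, z = [], []
    for _ in range(n):
        diff = rng.gauss(expected, sd_expected) - expected
        raw.append(diff)
        z.append(diff / sd_expected)
    return raw, z


spread = {}
for sd in (0.5, 2.0, 8.0):  # from simple to complex habitats (hypothetical values)
    raw, z = simulate_null(sd)
    spread[sd] = (statistics.stdev(raw), statistics.stdev(z))
    print(f"SD_expected = {sd}: sd(raw) = {spread[sd][0]:.2f}, sd(z) = {spread[sd][1]:.2f}")
```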

Defining the rhizosphere width

The results look poor, so it remains to be decided whether this analysis is kept.

2. The geometric niche: habitat availability constraint

Are fauna close to the root by choice (attraction) or necessity (habitat saturation)?

Fauna may be close to roots simply because dense root systems leave little root-free space. To separate active biological selection from this geometric necessity, we analyzed the \(z\)-score across a gradient of root density.

In our experimental setup, root density within a given scanner is often stable over a 7-day period. This creates a statistical challenge. If we model every individual occurrence, we artificially inflate the degrees of freedom for a predictor (root density) that does not vary at that scale. To address this and obtain a conservative estimate of the “geometric niche” effect, we performed a sensitivity analysis comparing two modeling approaches:

  • Individual-Level Model: High resolution, but potentially biased by intra-replicate autocorrelation.
  • Aggregated-Level Model: Data collapsed to Scanner × Period × Taxon × Root type means, so that each data point represents a unique “habitat density state.”
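The aggregation step can be sketched with the Python standard library. The records, scanner and period labels, and root type names below are hypothetical. Only the grouping keys (Scanner × Period × Taxon × Root type) come from the text.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical individual-level records:
# (scanner, period, taxon, root_type, root_density, z)
records = [
    ("S1", "P1", "acari", "primary", 0.12, -1.4),
    ("S1", "P1", "acari", "primary", 0.12, -0.2),
    ("S1", "P1", "acari", "primary", 0.12, 0.5),
    ("S1", "P2", "acari", "primary", 0.30, -2.1),
    ("S2", "P1", "collembola", "lateral", 0.05, 0.8),
    ("S2", "P1", "collembola", "lateral", 0.05, -0.6),
]


def aggregate(rows):
    """Collapse rows to one (mean density, mean z, n) per
    Scanner x Period x Taxon x Root type cell."""
    cells = defaultdict(list)
    for scanner, period, taxon, root_type, density, z in rows:
        cells[(scanner, period, taxon, root_type)].append((density, z))
    return {
        key: (fmean(d for d, _ in vals), fmean(z for _, z in vals), len(vals))
        for key, vals in cells.items()
    }


aggregated = aggregate(records)
for key, (density, mean_z, n) in aggregated.items():
    print(key, f"density={density:.2f}", f"mean z={mean_z:.2f}", f"n={n}")
```

Six hypothetical occurrences collapse into three cells. This mirrors, on a small scale, the reduction from 5438 rows to 258 in Table 2.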

Visualizing the aggregation logic

The visualization below shows that aggregation preserves the experimental gradient while “flattening” the low-density zones that individual-level clusters overrepresent.

(a) Grey: individual observations | Orange: Replicate averages (aggregated)
Figure 4: Structural impact of data aggregation

The “vertical striping” in Figure 4A shows that many fauna occurrences share identical root density values. This pseudoreplication can lead to Type I errors (false positives). As shown in Figure 4B, aggregation preserves the overall experimental gradient of root density while smoothing the overrepresented low-density zones.
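The Type I error risk can be illustrated with a small simulation. Root density has no true effect, but occurrences from the same replicate share a common noise term. This is a simplified sketch that uses ordinary least squares rather than the GAMs in the analysis. The number of replicates, the occurrences per replicate and the noise levels are hypothetical.

```python
import math
import random
import statistics


def slope_t(x, y):
    """OLS slope divided by its naive standard error (assumes independent residuals)."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope / math.sqrt(rss / (n - 2) / sxx)


def false_positive_rates(n_sims=400, n_clusters=20, per_cluster=30, seed=7):
    """Share of simulations reporting p < 0.05 when there is NO density effect."""
    rng = random.Random(seed)
    hits_ind = hits_agg = 0
    for _ in range(n_sims):
        x_ind, y_ind, x_agg, y_agg = [], [], [], []
        for _ in range(n_clusters):
            density = rng.uniform(0, 1)   # one root density per replicate
            shared = rng.gauss(0, 1)      # replicate-level noise
            ys = [shared + rng.gauss(0, 1) for _ in range(per_cluster)]
            x_ind += [density] * per_cluster  # the "vertical stripe"
            y_ind += ys
            x_agg.append(density)
            y_agg.append(statistics.fmean(ys))
        # Two-sided 5% critical t values, valid for the default sizes only:
        # df = 598 (individual) -> ~1.964, df = 18 (aggregated) -> ~2.101
        hits_ind += abs(slope_t(x_ind, y_ind)) > 1.964
        hits_agg += abs(slope_t(x_agg, y_agg)) > 2.101
    return hits_ind / n_sims, hits_agg / n_sims


rate_ind, rate_agg = false_positive_rates()
print(f"individual: {rate_ind:.2f}, aggregated: {rate_agg:.2f}")
```

With these settings, the individual-level test rejects the true null hypothesis in well over half of the simulations. The test on replicate means stays close to the nominal 5%.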

                     df      AIC
mod_gaussian   18.92717 398.7315
mod_aggregated 23.06470 381.1369
Figure 5

The choice of the scaled \(t\) family (scat()) was motivated by the distribution of the \(z\)-scores. Preliminary models using a standard Gaussian distribution showed marked deviations in the residual QQ-plots, with pronounced “heavy tails”: extreme observations (outliers) were more frequent than a Normal distribution predicts. In the AIC comparison (Figure 5), mod_aggregated also has a lower AIC than mod_gaussian (381.1 vs 398.7).
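A simple numerical counterpart to the QQ-plot check compares how often standardized residuals exceed 3 robust SDs with the Normal expectation (about 0.27%). The sketch below runs on simulated residuals, with a Student \(t\) distribution (5 df) standing in for heavy-tailed data. It is not the diagnostic applied to the study's models, and tail_excess is a hypothetical helper.

```python
import random
import statistics


def tail_excess(residuals, k=3.0):
    """Ratio of observed to Normal-expected frequency of |r| > k * sigma,
    with sigma estimated robustly from the median absolute deviation."""
    med = statistics.median(residuals)
    mad = statistics.median(abs(r - med) for r in residuals)
    sigma = 1.4826 * mad  # MAD-to-SD consistency factor for Normal data
    observed = sum(abs(r - med) > k * sigma for r in residuals) / len(residuals)
    expected = 2 * (1 - statistics.NormalDist().cdf(k))
    return observed / expected


rng = random.Random(3)
n = 20000
normal_resid = [rng.gauss(0, 1) for _ in range(n)]
# Student-t residuals with 5 df: Z / sqrt(chi2_5 / 5), chi2_5 drawn as Gamma(2.5, 2)
t_resid = [rng.gauss(0, 1) / (rng.gammavariate(2.5, 2.0) / 5) ** 0.5 for _ in range(n)]

ratio_normal = tail_excess(normal_resid)
ratio_t = tail_excess(t_resid)
print(f"Normal residuals: {ratio_normal:.1f}x the expected tail frequency")
print(f"t(5) residuals:   {ratio_t:.1f}x the expected tail frequency")
```

Normal residuals give a ratio near 1. A ratio several times larger signals the kind of heavy tails that a scaled \(t\) family accommodates.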

Model sensitivity analysis

We fitted two Generalized Additive Models (GAMs) with the scat() family (scaled \(t\)-distribution) to account for the heavy-tailed residuals often found in ecological count-derived data.

Table 2: Sensitivity analysis: comparison of individual vs. aggregated modeling approaches

Model approach  AIC        Dev. expl. (%)  Adj. R²  Mean EDF  Significant smooths  N
Individual      15579.328  2.661           0.027    3.281     6 / 8                5438
Aggregated      381.137    9.441           0.038    1.663     0 / 8                258

The individual model identifies 6 of the 8 smooth terms as significant. This significance is deceptive, however: with only 2.7% of deviance explained, the model has almost no predictive power. The low p-values are likely a byproduct of the large sample size and strong spatial autocorrelation (pseudoreplication). By collapsing the data to replicate means, the aggregated model removes individual-level stochasticity. Although the significance vanishes (0/8 terms), the deviance explained increases about 3.5-fold (from 2.7% to 9.4%). The mean EDF (effective degrees of freedom) also drops from ~3.28 to ~1.66. This indicates that the model no longer “wiggles” through noise, but instead fits a simpler, more stable (though non-significant) trend. The two AIC values are not directly comparable, because the models are fitted to different data (N = 5438 vs 258).
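The ratios quoted in this paragraph follow directly from the values in Table 2:

```python
# Values copied from Table 2
individual = {"dev_expl_pct": 2.661, "mean_edf": 3.281, "n": 5438}
aggregated = {"dev_expl_pct": 9.441, "mean_edf": 1.663, "n": 258}

dev_ratio = aggregated["dev_expl_pct"] / individual["dev_expl_pct"]  # gain in deviance explained
edf_ratio = individual["mean_edf"] / aggregated["mean_edf"]          # reduction in wiggliness
rows_per_cell = individual["n"] / aggregated["n"]                    # occurrences per aggregated point

print(f"deviance explained: {dev_ratio:.2f}-fold increase")
print(f"mean EDF: {edf_ratio:.2f}-fold decrease")
print(f"individual rows per aggregated point: {rows_per_cell:.1f}")
```

Deviance explained rises about 3.5-fold and the mean EDF roughly halves. Each aggregated point summarizes about 21 individual rows on average.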

(a) Negative z-scores (red) indicate active biological attraction persisting through habitat saturation
Figure 6: Final sensitivity model: aggregated response

The aggregated model provides a more conservative estimate of the “geometric niche.” It shows that, once individual-level noise is removed, there is no evidence of a strong non-linear selection effect along the root density gradient.